Semantic segmentation is a popular research topic in computer vision, and many efforts have been made on it with impressive results. In this paper, we intend to search for an optimal network structure that can run in real time for this problem. Towards this goal, we jointly search the depth, channel, dilation rate and feature spatial resolution, which results in a search space consisting of about 2.78×10^324 possible choices. To handle such a large search space, we leverage differentiable architecture search methods. However, the architecture parameters searched by existing differentiable methods need to be discretized, which causes a discretization gap between the architecture parameters found by the differentiable methods and their discretized version taken as the final solution of the architecture search. Hence, we alleviate the problem of the discretization gap from the innovative perspective of solution space regularization. Specifically, a novel Solution Space Regularization (SSR) loss is first proposed to effectively encourage the supernet to converge to its discrete counterpart. Then, a new hierarchical and progressive solution space shrinking method is presented to further achieve high search efficiency. In addition, we theoretically show that the optimization of the SSR loss is equivalent to L_0-norm regularization, which accounts for the improved search-evaluation gap. Comprehensive experiments show that the proposed search scheme can efficiently find an optimal network structure that achieves an extremely fast segmentation speed (175 FPS) with a small model size (1 M) while maintaining comparable accuracy.
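The discretization-gap idea above can be illustrated with a toy penalty on softmax-normalized architecture weights. The exact form of the SSR loss is not given here, so the `ssr_penalty` below (one minus the largest softmax probability) is only an assumed stand-in that rewards near-one-hot, i.e. near-discrete, architecture parameters:

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def ssr_penalty(alpha):
    # distance of the softmax distribution from its closest one-hot
    # vector: (near) zero only when the architecture choice is (near) discrete
    return 1.0 - softmax(alpha).max()

# near-uniform architecture weights incur a large penalty ...
diffuse = ssr_penalty(np.array([0.1, 0.0, -0.1]))
# ... while near-one-hot weights incur almost none
peaked = ssr_penalty(np.array([8.0, 0.0, -1.0]))
```

Minimizing such a penalty alongside the task loss pushes the supernet toward architectures whose discretized version behaves like the continuous one, shrinking the gap.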
The success of Vision Transformers (ViT) in various computer vision tasks has promoted the ever-increasing prevalence of this convolution-free network. The fact that ViT works on image patches makes it potentially relevant to the problem of jigsaw puzzle solving, a classical self-supervised task that aims to reorder shuffled sequential image patches back to their natural form. Despite its simplicity, solving jigsaw puzzles has been demonstrated to be helpful for tasks using convolutional neural networks (CNNs), such as self-supervised feature representation learning, domain generalization, and fine-grained classification. In this paper, we explore solving jigsaw puzzles as a self-supervised auxiliary loss for image classification, named Jigsaw-ViT. We show two modifications that can make Jigsaw-ViT outperform standard ViT: discarding positional embeddings and masking patches randomly. Yet simple, we find that Jigsaw-ViT is able to improve both the generalization and the robustness of standard ViT, which is usually a trade-off. Experimentally, we show that adding the jigsaw puzzle branch provides better generalization than ViT on large-scale image classification on ImageNet. Moreover, the auxiliary task also improves robustness to noisy labels on Animal-10N, Food-101N and Clothing1M, as well as robustness to adversarial examples. Our implementation is available at https://yingyichen-cyy.github.io/jigsaw-vit/.
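As a minimal sketch of the jigsaw side of the auxiliary task, the hypothetical helper below splits an image into non-overlapping patches, shuffles them, and returns the permutation targets a jigsaw prediction head would be trained to recover (Jigsaw-ViT's full recipe additionally drops positional embeddings and masks patches at random):

```python
import numpy as np

def make_jigsaw_sample(image, patch, rng):
    # split into non-overlapping patches in row-major order
    h, w = image.shape
    patches = [image[i:i + patch, j:j + patch]
               for i in range(0, h, patch)
               for j in range(0, w, patch)]
    perm = rng.permutation(len(patches))
    shuffled = [patches[k] for k in perm]
    # perm[n] is the original position of the n-th shuffled patch,
    # i.e. the classification target for a jigsaw prediction head
    return shuffled, perm

rng = np.random.default_rng(0)
img = np.arange(16.0).reshape(4, 4)
shuffled, target = make_jigsaw_sample(img, 2, rng)
```

The jigsaw branch then predicts `target` from `shuffled`, and its cross-entropy is added to the classification loss as the auxiliary term.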
Multi-view spectral clustering (MVSC) has attracted increasing attention due to diverse data sources. However, most existing works are prohibitive for out-of-sample predictions and overlook model interpretability and the exploration of clustering results. In this paper, a new MVSC method is proposed within the Restricted Kernel Machine framework through a shared latent space. Through the lens of conjugate feature duality, we cast the weighted kernel principal component analysis problem for MVSC and develop a modified weighted conjugate feature duality to formulate dual variables. In our method, the dual variables, playing the role of hidden features, are shared by all views to construct a common latent space, coupling the views by learning projections from view-specific spaces. Such a latent space promotes well-separated clusters and provides straightforward data exploration, facilitating visualization and interpretation. Our method requires only a single eigendecomposition, whose dimension is independent of the number of views. To boost higher-order correlations, tensor-based modeling is introduced without increasing computational complexity. Our method can be flexibly applied with out-of-sample extensions, greatly improving efficiency for large-scale data with fixed-size kernel schemes. Numerical experiments verify that our method is effective regarding accuracy, efficiency, and interpretability, showing a sharp eigenvalue decay and distinct latent variable distributions.
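The "shared latent space from a single eigendecomposition" idea can be sketched by centering and averaging per-view kernel matrices before one symmetric eigendecomposition. The actual RKM-based formulation with weighted conjugate feature duality is more involved; `rbf_kernel` and `shared_latent_space` are illustrative names, not the paper's API:

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def shared_latent_space(views, dim):
    # center each view's kernel, average across views, then take a
    # single eigendecomposition whose size is independent of the
    # number of views
    n = views[0].shape[0]
    C = np.eye(n) - np.ones((n, n)) / n
    K = sum(C @ rbf_kernel(V) @ C for V in views) / len(views)
    vals, vecs = np.linalg.eigh(K)
    order = np.argsort(vals)[::-1]
    return vals[order][:dim], vecs[:, order[:dim]]

rng = np.random.default_rng(1)
views = [rng.normal(size=(6, 3)), rng.normal(size=(6, 2))]
vals, H = shared_latent_space(views, 2)  # latent coordinates of the 6 samples
```

The rows of `H` serve as common latent coordinates, which can then be clustered or visualized directly; a sharp decay in `vals` signals well-separated clusters.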
As a powerful modelling method, piecewise linear neural networks (PWLNNs) have proven successful in various fields, most recently in deep learning. To apply PWLNN methods, both representation and learning have long been studied. In 1977, the canonical representation pioneered the works of shallow PWLNN learning through incremental design, but applications to large-scale data were prohibitive. In 2010, the rectified linear unit (ReLU) advocated the prevalence of PWLNNs in deep learning. Since then, PWLNNs have been successfully applied to a wide range of tasks and achieved advantageous performance. In this primer, we systematically introduce the methodology of PWLNNs by grouping the works into shallow and deep networks. First, different PWLNN representation models are constructed with elaborated examples. With PWLNNs, the evolution of learning algorithms for data is presented, and fundamental theoretical analysis follows for a deeper understanding. Then, representative applications are introduced together with discussions and outlooks.
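A two-layer ReLU network makes the "piecewise linear" property concrete: once the on/off pattern of the ReLU units is fixed, the network is exactly affine on that region. The tiny hand-picked weights below are purely illustrative:

```python
import numpy as np

def relu_net(x, W1, b1, W2, b2):
    # a two-layer ReLU network: the canonical deep PWLNN example
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

W1 = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 1.0]])
b1 = np.zeros(3)
W2 = np.array([[1.0, 1.0, 1.0]])
b2 = np.zeros(1)

x = np.array([1.0, 2.0])
pattern = (W1 @ x + b1) > 0            # all three units active at x
J = W2 @ (W1 * pattern[:, None])       # the local linear map on this region
offset = relu_net(x, W1, b1, W2, b2) - J @ x
```

For any input with the same activation pattern, `J @ z + offset` reproduces the network exactly; the input space is tiled by such regions, one affine map per region.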
The score-based query attacks (SQAs) pose practical threats to deep neural networks by crafting adversarial perturbations within dozens of queries, only using the model's output scores. Nonetheless, we note that if the loss trend of the outputs is slightly perturbed, SQAs could be easily misled and thereby become much less effective. Following this idea, we propose a novel defense, namely Adversarial Attack on Attackers (AAA), to confound SQAs towards incorrect attack directions by slightly modifying the output logits. In this way, (1) SQAs are prevented regardless of the model's worst-case robustness; (2) the original model predictions are hardly changed, i.e., no degradation on clean accuracy; (3) the calibration of confidence scores can be improved simultaneously. Extensive experiments are provided to verify the above advantages. For example, by setting $\ell_\infty=8/255$ on CIFAR-10, our proposed AAA helps WideResNet-28 secure 80.59% accuracy under Square attack (2500 queries), while the best prior defense (i.e., adversarial training) only attains 67.44%. Since AAA attacks SQA's general greedy strategy, such advantages of AAA over 8 defenses can be consistently observed on 8 CIFAR-10/ImageNet models under 6 SQAs, using different attack targets, bounds, norms, losses, and strategies. Moreover, AAA calibrates better without hurting the accuracy. Our code is available at https://github.com/Sizhe-Chen/AAA.
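A toy caricature of the AAA mechanism, with assumed names and an assumed sawtooth schedule (the real method fits the logits to a designed loss curve via optimization): the predicted class is kept, but the top-1 margin is rewritten so that the loss trend an attacker observes reverses within each period:

```python
import numpy as np

def aaa_sketch(logits, period=1.0):
    # keep the argmax, but remap the top-1 margin so that as an
    # attacker slightly increases the true loss, the observed margin
    # grows instead of shrinking, misleading greedy query attacks
    z = np.asarray(logits, dtype=float).copy()
    top = int(z.argmax())
    runner_up = np.partition(z, -2)[-2]
    margin = z[top] - runner_up
    # reverse the margin trend within each period; stays positive,
    # so the predicted class never changes
    z[top] = runner_up + period * np.floor(margin / period) + (period - margin % period)
    return z
```

Because the remapped margin is always positive, clean predictions are untouched, while the local loss trend that a score-based attacker follows points the wrong way.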
Millions of patients suffer from rare diseases around the world. However, the samples of rare diseases are much smaller than those of common diseases. In addition, due to the sensitivity of medical data, hospitals are usually reluctant to share patient information for data fusion, citing privacy concerns. These challenges make it difficult for traditional AI models to extract rare-disease features for disease prediction. In this paper, we overcome this limitation by proposing a novel approach to rare disease prediction based on federated meta-learning. To improve the prediction accuracy for rare diseases, we design an attention-based meta-learning (ATML) approach that dynamically adjusts the attention to different tasks according to the measured training effect of the base learners. In addition, a dynamic-weight-based fusion strategy is proposed to further improve the accuracy of federated learning, which dynamically selects clients based on the accuracy of each local model. Experiments show that with as few as five shots, our approach outperforms the original federated meta-learning algorithm in both accuracy and speed. Compared with each hospital's local model, the proposed model's average prediction accuracy increases by 13.28%.
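The accuracy-driven weighting can be sketched as a softmax attention over clients, so better-performing local models contribute more to the fused model. `weighted_aggregate` and its temperature are assumptions for illustration rather than the paper's exact ATML/fusion rules:

```python
import numpy as np

def weighted_aggregate(client_params, client_accuracies, temperature=0.1):
    # softmax attention over clients, driven by each local model's
    # measured accuracy; higher accuracy -> larger aggregation weight
    acc = np.asarray(client_accuracies, dtype=float)
    w = np.exp(acc / temperature)
    w = w / w.sum()
    stacked = np.stack(client_params)      # (n_clients, n_params)
    return w, (w[:, None] * stacked).sum(axis=0)

# two toy clients: a weak one (all-zero params) and a strong one (all-one)
w, fused = weighted_aggregate([np.zeros(2), np.ones(2)], [0.5, 0.9])
```

The fused parameters land much closer to the stronger client's, which is the intended effect of dynamic, accuracy-based client weighting.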
Deep neural networks (DNNs) are acknowledged to be vulnerable to adversarial attacks, while existing black-box attacks require extensive queries on the victim DNN to achieve high success rates. For query efficiency, surrogate models of the victim are used to generate transferable adversarial examples (AEs) because of their gradient similarity (GS), i.e., the surrogates' attack gradients are similar to the victim's. However, it is generally neglected to exploit their similarity on outputs, namely the prediction similarity (PS), to filter out inefficient queries via surrogates without querying the victim. To jointly utilize and also optimize the surrogates' GS and PS, we develop QueryNet, a unified attack framework that can significantly reduce queries. QueryNet creatively attacks with multi-identity surrogates, i.e., it crafts several AEs for one sample using different surrogates, and also uses the surrogates to decide on the most promising AE to query. After that, the victim's query feedback is accumulated to optimize not only the surrogates' parameters but also their architectures, enhancing both the GS and the PS. Although QueryNet has no access to pre-trained surrogates, it reduces queries compared to alternatives within an acceptable time, according to our comprehensive experiments on ImageNet, allowing only 8-bit image queries and no access to the victim's training data. The code is available at https://github.com/allenchen1998/querynet.
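QueryNet's "use surrogates to decide which AE to query" step can be caricatured as ranking candidates by an ensemble of surrogate outputs. `pick_most_promising` and the toy callable surrogate below are assumptions for illustration, not the released code's API:

```python
import numpy as np

def pick_most_promising(candidates, surrogates, label):
    # score each candidate AE by the surrogate ensemble's mean
    # probability on the true label; the lowest score looks most
    # adversarial and is the only one sent to the victim
    scores = [np.mean([s(x)[label] for s in surrogates]) for x in candidates]
    return int(np.argmin(scores))

# toy surrogate: a callable mapping an input to a probability vector
toy_surrogate = lambda x: np.array([x[0], 1.0 - x[0]])
best = pick_most_promising([np.array([0.9]), np.array([0.2])], [toy_surrogate], 0)
```

Only the selected candidate costs a victim query; the feedback on it can then be used to refine the surrogates, as the framework describes.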
Quantization has emerged as one of the most prevalent approaches to compress and accelerate neural networks. Recently, data-free quantization has been widely studied as a practical and promising solution. It synthesizes data for calibrating the quantized model according to the batch normalization (BN) statistics of the FP32 model and significantly relieves the heavy dependency on real training data in traditional quantization methods. Unfortunately, we find that, in practice, synthetic data constrained by BN statistics suffers severe homogenization at both the distribution level and the sample level, which further causes significant performance degradation of the quantized model. We propose a Diverse Sample Generation (DSG) scheme to mitigate the adverse effects caused by homogenization. Specifically, we slack the alignment of feature statistics in the BN layer to relax the constraint at the distribution level, and design a layerwise enhancement to reinforce specific layers for different data samples. Our DSG scheme is versatile and can even be applied to modern post-training quantization methods such as AdaRound. We evaluate the DSG scheme on the large-scale image classification task and consistently obtain significant improvements over various network architectures and quantization methods, especially when quantized to lower bits (e.g., up to 22% improvement on W4A4). Moreover, benefiting from the enhanced diversity, models calibrated with synthetic data perform close to those calibrated with real data, and even outperform them on W4A4.
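The distribution-level relaxation can be sketched as a margin ("slack") on the usual BN-statistics matching loss, so synthetic samples may deviate from the FP32 statistics within a tolerance and stay diverse. The margin form below is assumed, not the paper's exact loss:

```python
import numpy as np

def slack_bn_loss(feat_mean, feat_var, bn_mean, bn_var, slack=0.1):
    # zero penalty inside a slack band around the BN statistics,
    # quadratic penalty outside it; exact matching (slack=0) would
    # push every synthetic sample toward the same statistics
    d_mean = np.maximum(np.abs(feat_mean - bn_mean) - slack, 0.0)
    d_var = np.maximum(np.abs(feat_var - bn_var) - slack, 0.0)
    return float((d_mean ** 2).sum() + (d_var ** 2).sum())
```

Samples whose feature statistics fall anywhere inside the band incur no penalty, which is what lets different synthetic samples differ from each other instead of collapsing onto the BN statistics.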
As one of the most important psychic stress reactions, micro-expressions (MEs) are spontaneous and transient facial expressions that can reveal the genuine emotions of human beings. Thus, recognizing MEs (MER) automatically is becoming increasingly crucial in the field of affective computing, and provides essential technical support in lie detection, psychological analysis and other areas. However, the lack of abundant ME data seriously restricts the development of cutting-edge data-driven MER models. Despite recent efforts to alleviate this problem with several spontaneous ME datasets, the amount of available data remains tiny. To solve the problem of ME data hunger, we construct a dynamic spontaneous ME dataset with the largest current ME data scale, called DFME (Dynamic Facial Micro-expressions), which includes 7,526 well-labeled ME videos induced by 671 participants and annotated by more than 20 annotators over three years. Afterwards, we adopt four classical spatiotemporal feature learning models on DFME to perform MER experiments and objectively verify the validity of the DFME dataset. In addition, we explore different solutions to the class imbalance and key-frame sequence sampling problems in dynamic MER on DFME, so as to provide a valuable reference for future research. The comprehensive experimental results show that our DFME dataset can facilitate the research of automatic MER, and provide a new benchmark for MER. DFME will be published via https://mea-lab-421.github.io.
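One standard remedy for the class imbalance mentioned above is class-balanced sampling: pick a class uniformly at random, then a sample from that class. The helper below is a generic sketch with assumed names, not DFME's specific protocol:

```python
import random

def balanced_batch(labels, batch_size, rng):
    # draw each batch element by first picking a class uniformly,
    # then picking a sample index from that class, so rare classes
    # appear as often as frequent ones
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    classes = sorted(by_class)
    return [rng.choice(by_class[rng.choice(classes)])
            for _ in range(batch_size)]

labels = ["happy"] * 90 + ["fear"] * 10   # 9:1 imbalance
rng = random.Random(0)
draws = balanced_batch(labels, 2000, rng)
```

Under this scheme the rare class ("fear") fills roughly half of each batch instead of a tenth, counteracting the imbalance during training.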
Interview has been regarded as one of the most crucial steps for recruitment. To fully prepare for the interview with the recruiters, job seekers usually practice with mock interviews between each other. However, such a mock interview with peers is generally far from the real interview experience: the mock interviewers are not guaranteed to be professional and are not likely to behave like a real interviewer. Due to the rapid growth of online recruitment in recent years, recruiters tend to have online interviews, which makes it possible to collect real interview data from real interviewers. In this paper, we propose a novel application named EZInterviewer, which aims to learn from the online interview data and provide mock interview services to job seekers. The task is challenging in two ways: (1) the interview data are now available but still low-resource; (2) generating meaningful and relevant interview dialogs requires a thorough understanding of both resumes and job descriptions. To address the low-resource challenge, EZInterviewer is trained on a very small set of interview dialogs. The key idea is to reduce the number of parameters that rely on interview dialogs by disentangling the knowledge selector and the dialog generator, so that most parameters can be trained with ungrounded dialogs as well as the resume data, which are not low-resource. Evaluation results on a real-world job interview dialog dataset indicate that we achieve promising results in generating mock interviews. With the help of EZInterviewer, we hope to make mock interview practice easier for job seekers.